The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical imaging analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%). 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based. Of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once. This was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants and only 50% of the participants performed ensembling based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
translated by 谷歌翻译
Multi-view representation learning has developed rapidly over the past decades and has been applied in many fields. However, most previous works assumed that each view is complete and aligned. This leads to an inevitable deterioration in their performance when encountering practical problems such as missing or unaligned views. To address the challenge of representation learning on partially aligned multi-view data, we propose a new cross-view graph contrastive learning framework, which integrates multi-view information to align data and learn latent representations. Compared with current approaches, the proposed method has the following merits: (1) our model is an end-to-end framework that simultaneously performs view-specific representation learning via view-specific autoencoders and cluster-level data aligning by combining multi-view information with the cross-view graph contrastive learning; (2) it is easy to apply our model to explore information from three or more modalities/sources as the cross-view graph contrastive learning is devised. Extensive experiments conducted on several real datasets demonstrate the effectiveness of the proposed method on the clustering and classification tasks.
translated by 谷歌翻译
Recently, the dominant DETR-based approaches apply central-concept spatial prior to accelerate Transformer detector convergency. These methods gradually refine the reference points to the center of target objects and imbue object queries with the updated central reference information for spatially conditional attention. However, centralizing reference points may severely deteriorate queries' saliency and confuse detectors due to the indiscriminative spatial prior. To bridge the gap between the reference points of salient queries and Transformer detectors, we propose SAlient Point-based DETR (SAP-DETR) by treating object detection as a transformation from salient points to instance objects. In SAP-DETR, we explicitly initialize a query-specific reference point for each object query, gradually aggregate them into an instance object, and then predict the distance from each side of the bounding box to these points. By rapidly attending to query-specific reference region and other conditional extreme regions from the image features, SAP-DETR can effectively bridge the gap between the salient point and the query-based Transformer detector with a significant convergency speed. Our extensive experiments have demonstrated that SAP-DETR achieves 1.4 times convergency speed with competitive performance. Under the standard training scheme, SAP-DETR stably promotes the SOTA approaches by 1.0 AP. Based on ResNet-DC-101, SAP-DETR achieves 46.9 AP.
translated by 谷歌翻译
Models should be able to adapt to unseen data during test-time to avoid performance drops caused by inevitable distribution shifts in real-world deployment scenarios. In this work, we tackle the practical yet challenging test-time adaptation (TTA) problem, where a model adapts to the target domain without accessing the source data. We propose a simple recipe called \textit{Data-efficient Prompt Tuning} (DePT) with two key ingredients. First, DePT plugs visual prompts into the vision Transformer and only tunes these source-initialized prompts during adaptation. We find such parameter-efficient finetuning can efficiently adapt the model representation to the target domain without overfitting to the noise in the learning objective. Second, DePT bootstraps the source representation to the target domain by memory bank-based online pseudo-labeling. A hierarchical self-supervised regularization specially designed for prompts is jointly optimized to alleviate error accumulation during self-training. With much fewer tunable parameters, DePT demonstrates not only state-of-the-art performance on major adaptation benchmarks VisDA-C, ImageNet-C, and DomainNet-126, but also superior data efficiency, i.e., adaptation with only 1\% or 10\% data without much performance degradation compared to 100\% data. In addition, DePT is also versatile to be extended to online or multi-source TTA settings.
translated by 谷歌翻译
高光谱成像技术(HSI)在远程分布光谱波长上记录了视觉信息。代表性的高光谱图像采集程序通过编码的光圈快照光谱成像器(CASSI)进行了3D到2D的编码,并且需要用于3D信号重建的软件解码器。基于此编码程序,两个主要挑战妨碍了高保真重建的方式:(i)获得2D测量值,CASSI通过分散器触觉并将其挤压到同一空间区域,从。 (ii)物理编码的光圈(掩码)将通过选择性阻止像素的光曝光来导致掩盖数据丢失。为了应对这些挑战,我们提出了具有面膜感知的学习策略的空间光谱(S2-)变压器体系结构。首先,我们同时利用空间和光谱注意模型来沿两个维度划分2D测量中的混合信息。空间和光谱线索跨的一系列变压器结构是系统设计的,它考虑了两倍提示之间的信息相互依赖性。其次,蒙面的像素将引起更高的预测难度,应与未掩盖的像素不同。因此,我们通过推断出对蒙版意识预测的难度级别来适应归因于面具结构的损失惩罚。我们提出的方法不仅定量设置了新的最新方法,而且在结构化区域中产生了更好的感知质量。
translated by 谷歌翻译
语义本地化(SELO)是指使用语义信息(例如文本)在大规模遥感(RS)图像中获得最相关位置的任务。作为基于跨模式检索的新兴任务,Selo仅使用字幕级注释来实现语义级检索,这表明了其在统一下游任务方面的巨大潜力。尽管Selo已连续执行,但目前没有系统地探索并分析了这一紧急方向。在本文中,我们彻底研究了这一领域,并根据指标和测试数据提供了完整的基准,以推进SELO任务。首先,基于此任务的特征,我们提出了多个判别评估指标来量化SELO任务的性能。设计的显着面积比例,注意力转移距离和离散的注意距离可用于评估从像素级别和区域级别中产生的SELO图。接下来,为了为SELO任务提供标准评估数据,我们为多样化的,多语义的,多目标语义定位测试集(AIR-SLT)贡献。 AIR-SLT由22个大型RS图像和59个具有不同语义的测试用例组成,旨在为检索模型提供全面的评估。最后,我们详细分析了RS跨模式检索模型的SELO性能,探索不同变量对此任务的影响,并为SELO任务提供了完整的基准测试。我们还建立了一个新的范式来引用RS表达理解,并通过将其与检测和道路提取等任务相结合,证明了Selo在语义中的巨大优势。拟议的评估指标,语义本地化测试集和相应的脚本已在github.com/xiaoyuan1996/semanticlocalizationmetrics上访问。
translated by 谷歌翻译
感觉到航天器的三维(3D)结构是成功执行许多轨道空间任务的先决条件,并且可以为许多下游视觉算法提供关键的输入。在本文中,我们建议使用光检测和范围传感器(LIDAR)和单眼相机感知航天器的3D结构。为此,提出了航天器深度完成网络(SDCNET),以根据灰色图像和稀疏深度图回收密集的深度图。具体而言,SDCNET将对象级航天器的深度完成任务分解为前景分割子任务和前景深度完成子任务,该任务首先将航天器区域划分,然后在段前景区域执行深度完成。这样,有效地避免了对前景航天器深度完成的背景干扰。此外,还提出了一个基于注意力的特征融合模块,以汇总不同输入之间的互补信息,该信息可以按顺序推论沿通道沿着不同特征和空间维度之间的相关性。此外,还提出了四个指标来评估对象级的深度完成性能,这可以更直观地反映航天器深度完成结果的质量。最后,构建了一个大规模的卫星深度完成数据集,用于培训和测试航天器深度完成算法。数据集上的经验实验证明了拟议的SDCNET的有效性,该SDCNET达到了0.25亿的平均绝对误差和0.759m的平均绝对截断误差,并通过较大的边缘超过了前期方法。航天器姿势估计实验也基于深度完成结果进行,实验结果表明,预测的密集深度图可以满足下游视觉任务的需求。
translated by 谷歌翻译
3D场景感性风格化旨在根据给定的样式图像从任意新颖的视图中生成光真逼真的图像,同时在从不同观点呈现时确保一致性。一些带有神经辐射场的现有风格化方法可以通过将样式图像的特征与多视图图像结合到训练3D场景来有效地预测风格化的场景。但是,这些方法生成了包含令人反感的伪影的新型视图图像。此外,他们无法为3D场景实现普遍的影迷风格化。因此,样式图像必须根据神经辐射场重新训练3D场景表示网络。我们提出了一个新颖的3D场景,逼真的风格转移框架来解决这些问题。它可以通过2D样式图像实现感性3D场景样式转移。我们首先预先训练了2D逼真的样式传输网络,该网络可以符合任何给定内容图像和样式图像之间的影片风格转移。然后,我们使用体素特征来优化3D场景并获得场景的几何表示。最后,我们共同优化了一个超级网络,以实现场景的逼真风格传输的任意样式图像。在转移阶段,我们使用预先训练的2D影视网络来限制3D场景中不同视图和不同样式图像的感性风格。实验结果表明,我们的方法不仅实现了任意样式图像的3D影像风格转移,而且还优于视觉质量和一致性方面的现有方法。项目页面:https://semchan.github.io/upst_nerf。
translated by 谷歌翻译
图神经网络(GNN)在图形分类和多样化的下游现实世界应用方面取得了巨大成功。尽管他们成功了,但现有的方法要么仅限于结构攻击,要么仅限于本地信息。这要求在图形分类上建立更一般的攻击框架,由于使用全球图表级信息生成本地节点级的对抗示例的复杂性,因此面临重大挑战。为了解决这个“全局到本地”问题,我们提出了一个通用框架CAMA,以通过层次样式操纵图形结构和节点特征来生成对抗性示例。具体而言,我们利用Graph类激活映射及其变体来产​​生与图形分类任务相对应的节点级的重要性。然后,通过算法的启发式设计,我们可以借助节点级别和子图级的重要性在不明显的扰动预算下执行功能和结构攻击。在六个现实世界基准上攻击四个最先进的图形分类模型的实验验证了我们框架的灵活性和有效性。
translated by 谷歌翻译
作为全球发病率的主要原因,肠道寄生虫感染仍然缺乏节省时间,高敏性和用户友好的检查方法。深度学习技术的发展揭示了其在生物形象中的广泛应用潜力。在本文中,我们应用了几个对象探测器,例如yolov5和变体cascadercnns,以自动区分显微镜图像中的寄生卵。通过专门设计的优化,包括原始数据增强,模型集合,传输学习和测试时间扩展,我们的模型在挑战数据集上实现了出色的性能。此外,我们的模型接受了增加的噪声训练,可以提高污染输入的较高鲁棒性,从而进一步扩大了其实践中的适用性。
translated by 谷歌翻译